Search CORE

102 research outputs found

A Neural Attention Model for Categorizing Patient Safety Events

Author: JL Elman
JR Clarke
L Soldaini
MA Makary
R Pascanu
S Hochreiter
Y Lecun
Publication venue
Publication date: 22/02/2017
Field of study

Medical errors are leading causes of death in the US and as such, prevention of these errors is paramount to promoting health care. Patient Safety Event reports are narratives describing potential adverse events to the patients and are important in identifying and preventing medical errors. We present a neural network architecture for identifying the type of safety events which is the first step in understanding these narratives. Our proposed model is based on a soft neural attention model to improve the effectiveness of encoding long sequences. Empirical results on two large-scale real-world datasets of patient safety reports demonstrate the effectiveness of our method with significant improvements over existing methods.Comment: ECIR 201

arXiv.org e-Print Archive

Crossref

An investigation of a deep learning based malware detection system

Author: David O. E.
Kolosnjaji Bojan
Lin Chih-Ta
Pascanu R.
Sahay Sanjay K.
Siddiqui Muazzam
Srivastava Nitish
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 16/09/2018
Field of study

We investigate a Deep Learning based system for malware detection. In the investigation, we experiment with different combination of Deep Learning architectures including Auto-Encoders, and Deep Neural Networks with varying layers over Malicia malware dataset on which earlier studies have obtained an accuracy of (98%) with an acceptable False Positive Rates (1.07%). But these results were done using extensive man-made custom domain features and investing corresponding feature engineering and design efforts. In our proposed approach, besides improving the previous best results (99.21% accuracy and a False Positive Rate of 0.19%) indicates that Deep Learning based systems could deliver an effective defense against malware. Since it is good in automatically extracting higher conceptual features from the data, Deep Learning based systems could provide an effective, general and scalable mechanism for detection of existing and unknown malware.Comment: 13 Pages, 4 figure

arXiv.org e-Print Archive

Crossref

Exploiting Cognitive Structure for Adaptive Learning

Author: Chang H.-S.
Glorot X.
Huang Z.
Konda V. R.
Pascanu R.
Piech C.
Pinar W. F.
Su Y.
Tang X.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 23/05/2019
Field of study

Adaptive learning, also known as adaptive teaching, relies on learning path recommendation, which sequentially recommends personalized learning items (e.g., lectures, exercises) to satisfy the unique needs of each learner. Although it is well known that modeling the cognitive structure including knowledge level of learners and knowledge structure (e.g., the prerequisite relations) of learning items is important for learning path recommendation, existing methods for adaptive learning often separately focus on either knowledge levels of learners or knowledge structure of learning items. To fully exploit the multifaceted cognitive structure for learning path recommendation, we propose a Cognitive Structure Enhanced framework for Adaptive Learning, named CSEAL. By viewing path recommendation as a Markov Decision Process and applying an actor-critic algorithm, CSEAL can sequentially identify the right learning items to different learners. Specifically, we first utilize a recurrent neural network to trace the evolving knowledge levels of learners at each learning step. Then, we design a navigation algorithm on the knowledge structure to ensure the logicality of learning paths, which reduces the search space in the decision process. Finally, the actor-critic algorithm is used to determine what to learn next and whose parameters are dynamically updated along the learning path. Extensive experiments on real-world data demonstrate the effectiveness and robustness of CSEAL.Comment: Accepted by KDD 2019 Research Track. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD'19

arXiv.org e-Print Archive

Crossref

A Hierarchical Recurrent Encoder-Decoder For Generative Context-Aware Query Suggestion

Author: Bastien F.
Clarke C. LA
El Hihi S.
Li X.
Mikolov T.
Pascanu R.
Shrivastava Anshumali
Sutskever I.
Publication venue
Publication date: 01/01/2015
Field of study

Users may strive to formulate an adequate textual query for their information need. Search engines assist the users by presenting query suggestions. To preserve the original search intent, suggestions should be context-aware and account for the previous queries issued by the user. Achieving context awareness is challenging due to data sparsity. We present a probabilistic suggestion model that is able to account for sequences of previous queries of arbitrary lengths. Our novel hierarchical recurrent encoder-decoder architecture allows the model to be sensitive to the order of queries in the context while avoiding data sparsity. Additionally, our model can suggest for rare, or long-tail, queries. The produced suggestions are synthetic and are sampled one word at a time, using computationally cheap decoding techniques. This is in contrast to current synthetic suggestion models relying upon machine learning pipelines and hand-engineered feature sets. Results show that it outperforms existing context-aware approaches in a next query prediction setting. In addition to query suggestion, our model is general enough to be used in a variety of other applications.Comment: To appear in Conference of Information Knowledge and Management (CIKM) 201

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

Powerpropagation: A sparsity inducing weight reparameterisation

Author: Jayakumar SM
Latham PE
Pascanu R
Schwarz J
Teh YW
Publication venue: NeurIPS
Publication date: 01/01/2021
Field of study

The training of sparse neural networks is becoming an increasingly important tool for reducing the computational footprint of models at training and evaluation, as well enabling the effective scaling up of models. Whereas much work over the years has been dedicated to specialised pruning techniques, little attention has been paid to the inherent effect of gradient based training on model sparsity. In this work, we introduce Powerpropagation, a new weight-parameterisation for neural networks that leads to inherently sparse models. Exploiting the behaviour of gradient descent, our method gives rise to weight updates exhibiting a “rich get richer” dynamic, leaving low-magnitude parameters largely unaffected by learning. Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely. Powerpropagation is general, intuitive, cheap and straight-forward to implement and can readily be combined with various other techniques. To highlight its versatility, we explore it in two very different settings: Firstly, following a recent line of work, we investigate its effect on sparse training for resource-constrained settings. Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark. Secondly, we advocate the use of sparsity in overcoming catastrophic forgetting, where compressed representations allow accommodating a large number of tasks at fixed model capacity. In all cases our reparameterisation considerably increases the efficacy of the off-the-shelf methods

UCL Discovery

On the Role of Optimization in Double Descent: A Least Squares Study

Author: Kuzborskij I
Pascanu R
Rivasplata O
Szepesvári C
Triki AR
Publication venue: Advances in Neural Information Processing Systems
Publication date: 01/01/2021
Field of study

Empirically it has been observed that the performance of deep neural networks steadily improves as we increase model size, contradicting the classical view on overfitting and generalization. Recently, the double descent phenomena has been proposed to reconcile this observation with theory, suggesting that the test error has a second descent when the model becomes sufficiently overparameterized, as the model size itself acts as an implicit regularizer. In this paper we add to the growing body of work in this space, providing a careful study of learning dynamics as a function of model size for the least squares scenario. We show an excess risk bound for the gradient descent solution of the least squares objective. The bound depends on the smallest non-zero eigenvalue of the covariance matrix of the input features, via a functional form that has the double descent behavior. This gives a new perspective on the double descent curves reported in the literature. Our analysis of the excess risk allows to decouple the effect of optimization and generalization error. In particular, we find that in case of noiseless regression, double descent is explained solely by optimization-related quantities, which was missed in studies focusing on the Moore-Penrose pseudoinverse solution. We believe that our derivation provides an alternative view compared to existing work, shedding some light on a possible cause of this phenomena, at least in the considered least squares setting. We empirically explore if our predictions hold for neural networks, in particular whether the covariance of intermediary hidden activations has a similar behavior as the one predicted by our derivations

arXiv.org e-Print Archive

UCL Discovery

A convolution BiLSTM neural network model for Chinese event extraction

Author: A Graves
N Srivastava
R Pascanu
S Hochreiter
Y Bengio
Y LeCun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

Chinese event extraction is a challenging task in information extraction. Previous approaches highly depend on sophisticated feature engineering and complicated natural language processing (NLP) tools. In this paper, we first come up with the language specific issue in Chinese event extraction, and then propose a convolution bidirectional LSTM neural network that combines LSTM and CNN to capture both sentence-level and lexical information without any hand-craft features. Experiments on ACE 2005 dataset show that our approaches can achieve competitive performances in both trigger labeling and argument role labeling

Crossref

Lancaster E-Prints

Dynamic Key-Value Memory Networks for Knowledge Tracing

Author: Chen T.
Corbett A. T.
Grefenstette E.
Joulin A.
Khajah M.
Krizhevsky A.
Maaten L. v. d.
Mikolov T.
Orr G. B.
Pardos Z. A.
Pascanu R.
Piech C.
Santoro A.
Sukhbaatar S.
Vinyals O.
Weston J.
Wilson K. H.
Xiong X.
Yudelson M. V.
Publication venue
Publication date: 17/02/2017
Field of study

Knowledge Tracing (KT) is a task of tracing evolving knowledge state of students with respect to one or more concepts as they engage in a sequence of learning activities. One important purpose of KT is to personalize the practice sequence to help students learn knowledge concepts efficiently. However, existing methods such as Bayesian Knowledge Tracing and Deep Knowledge Tracing either model knowledge state for each predefined concept separately or fail to pinpoint exactly which concepts a student is good at or unfamiliar with. To solve these problems, this work introduces a new model called Dynamic Key-Value Memory Networks (DKVMN) that can exploit the relationships between underlying concepts and directly output a student's mastery level of each concept. Unlike standard memory-augmented neural networks that facilitate a single memory matrix or two static memory matrices, our model has one static matrix called key, which stores the knowledge concepts and the other dynamic matrix called value, which stores and updates the mastery levels of corresponding concepts. Experiments show that our model consistently outperforms the state-of-the-art model in a range of KT datasets. Moreover, the DKVMN model can automatically discover underlying concepts of exercises typically performed by human annotations and depict the changing knowledge state of a student.Comment: To appear in 26th International Conference on World Wide Web (WWW), 201

arXiv.org e-Print Archive

Crossref

Visual Depth Mapping from Monocular Images using Recurrent Convolutional Neural Networks

Author: Alom M. Z.
Eigen D.
Heo Y. S.
Kingma D.
Lucas B. D.
Pascanu R.
Pinheiro P. H.
Saxena A.
Shah S.
Sperling G.
Vincent P.
Zbontar J.
Publication venue
Publication date: 10/12/2018
Field of study

A reliable sense-and-avoid system is critical to enabling safe autonomous operation of unmanned aircraft. Existing sense-and-avoid methods often require specialized sensors that are too large or power intensive for use on small unmanned vehicles. This paper presents a method to estimate object distances based on visual image sequences, allowing for the use of low-cost, on-board monocular cameras as simple collision avoidance sensors. We present a deep recurrent convolutional neural network and training method to generate depth maps from video sequences. Our network is trained using simulated camera and depth data generated with Microsoft's AirSim simulator. Empirically, we show that our model achieves superior performance compared to models generated using prior methods.We further demonstrate that the method can be used for sense-and-avoid of obstacles in simulation

arXiv.org e-Print Archive

Crossref

Overcoming catastrophic forgetting in neural networks

Author: Clopath C
Desjardins G
Grabska-Barwinska A
Hadsell R
Hassabis D
Kirkpatricka J
Kumaran D
Milan K
Pascanu R
Quan J
Rabinowitz N
Ramalho T
Rusu AA
Veness J
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 25/01/2017
Field of study

The ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence. Until now neural networks have not been capable of this and it has been widely thought that catastrophic forgetting is an inevitable feature of connectionist models. We show that it is possible to overcome this limitation and train networks that can maintain expertise on tasks that they have not experienced for a long time. Our approach remembers old tasks by selectively slowing down learning on the weights important for those tasks. We demonstrate our approach is scalable and effective by solving a set of classification tasks based on a hand-written digit dataset and by learning several Atari 2600 games sequentially

arXiv.org e-Print Archive

PubMed Central

Spiral - Imperial College Digital Repository